首页> 外文OA文献 >Kernel-based methods for bandit convex optimization
【2h】

Kernel-based methods for bandit convex optimization

机译:基于核的匪凸优化方法

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

We consider the adversarial convex bandit problem and we build the first$\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret forthis problem. To do so we introduce three new ideas in the derivative-freeoptimization literature: (i) kernel methods, (ii) a generalization of Bernoulliconvolutions, and (iii) a new annealing schedule for exponential weights (withincreasing learning rate). The basic version of our algorithm achieves$\tilde{O}(n^{9.5} \sqrt{T})$-regret, and we show that a simple variant of thisalgorithm can be run in $\mathrm{poly}(n \log(T))$-time per step at the cost ofan additional $\mathrm{poly}(n) T^{o(1)}$ factor in the regret. These resultsimprove upon the $\tilde{O}(n^{11} \sqrt{T})$-regret and$\exp(\mathrm{poly}(T))$-time result of the first two authors, and the$\log(T)^{\mathrm{poly}(n)} \sqrt{T}$-regret and$\log(T)^{\mathrm{poly}(n)}$-time result of Hazan and Li. Furthermore weconjecture that another variant of the algorithm could achieve$\tilde{O}(n^{1.5} \sqrt{T})$-regret, and moreover that this regret isunimprovable (the current best lower bound being $\Omega(n \sqrt{T})$ and it isachieved with linear functions). For the simpler situation of zeroth orderstochastic convex optimization this corresponds to the conjecture that theoptimal query complexity is of order $n^3 / \epsilon^2$.
机译:我们考虑对抗性凸匪问题,并为此问题构建了第一个$ \ mathrm {poly}(T)$时间算法,并使用$ \ mathrm {poly}(n)\ sqrt {T} $-regret。为此,我们在无导数最优化文献中引入了三个新思想:(i)核方法,(ii)Bernoulliconvolutions的一般化,以及(iii)指数权重的新退火时间表(提高学习率)。我们算法的基本版本实现了$ \ tilde {O}(n ^ {9.5} \ sqrt {T})$后悔,我们证明了该算法的简单变体可以在$ \ mathrm {poly}(n每步时间\ log(T))$的时间,但要为此付出额外的$ \ mathrm {poly}(n)T ^ {o(1)} $的代价。这些结果改进了前两位作者的$ \ tilde {O}(n ^ {11} \ sqrt {T})$后悔和$ \ exp(\ mathrm {poly}(T))$时间的结果,以及$ \ log(T)^ {\ mathrm {poly}(n)} \ sqrt {T} $-regret和$ \ log(T)^ {\ mathrm {poly}(n)} $的Hazan时间结果和李此外,我们推测该算法的另一个变体可以实现$ \ tilde {O}(n ^ {1.5} \ sqrt {T})$-regret,此外,这种遗憾是无法弥补的(当前的最佳下限是$ \ Omega(n \ sqrt {T})$,并且可以通过线性函数实现)。对于零阶随机凸优化的较简单情况,这对应于最佳查询复杂度为$ n ^ 3 / \ epsilon ^ 2 $的猜想。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
代理获取

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号